Project-Team:STARS

Inria | Raweb 2015 | Presentation of the Project-Team STARS | STARS Web Site



Section: New Results

Robust Global Tracker Based on an Online Estimation of Tracklet Descriptor Reliability

Participants: Thi Lan Anh Nguyen, Chau Duc Phu, François Brémond.

Keywords: Tracklet fusion, Multi-object tracking

Object tracking, the process of locating one or more moving objects over time in a single camera or across a camera network, is an important part of surveillance video processing. However, variations in video context pose many challenges to trackers: objects change their direction of movement, appearance, or pose; they are occluded by other objects or by the background; and the illumination changes. To overcome these challenges, the object appearance model must be recomputed over time so that the tracker adapts to context variation.

In the state of the art, some online learning approaches [52], [48] have been proposed to track objects frame by frame in various video scenes. These approaches learn online either object descriptors that discriminate the object from the current background, as in [52], or an object appearance model that discriminates between objects over time, as in [48]. However, their limitation is that the reliability of an object descriptor computed on the current frame alone is unstable, and false positives can reduce the tracking quality. Furthermore, these algorithms look for descriptors or signatures that discriminate one object from its neighborhood, without considering the correlation of this object with its candidate matches. Meanwhile, global tracking methods [91], [98] outperform the previous methods at filtering noise. The approach in [98] recovers fragmented object trajectories by using enhanced covariance-based signatures and online threshold learning. The approach in [91] proposes a tracker based on a hierarchical relation hypergraph. These global tracking algorithms obtain significant results in matching short trajectories and filtering some noise. However, their object descriptor weights are fixed for the whole video, so their tracking performance can degrade when the scene changes.

Figure 12. PETS2009 dataset: online computation of discriminative descriptor weights depending on each video scene.

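**Table 6.** Tracking performance. The best values are printed in bold, the second best values are printed in italic. GT: number of ground-truth trajectories; MT, PT, ML: percentage of mostly tracked, partially tracked and mostly lost trajectories; FG: number of track fragments.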
Dataset	Method	MOTA	MOTP	GT	MT	PT	ML	FG
PETS2015	Chau et al. [53]	–	–	2	0.0	100.0	0.0	2
	Ours (proposed approach + [53])	–	–	2	100.0	0.0	0.0	1
PETS2009	Chau et al. [53]	0.62	0.63	21	–	–	–	8
	Bae et al. with all [48]	0.83	0.69	23	100	0	0.0	4
	Zamir et al. [95]	0.90	0.69	21	–	–	–	–
	Bae et al. - global association [48]	0.73	0.69	23	100	0	0.0	12
	Badie et al. [47]	0.90	0.74	21	–	–	–	–
	Badie et al. [47] + [53]	0.85	0.71	21	66.6	23.9	9.5	6
	Ours (proposed approach + [53])	0.86	0.72	21	76.2	14.3	9.5	4
TUD-Stadtmitte	Milan et al. [74]	0.71	0.65	9	70.0	20.0	0.0	–
	Yan et al. [94]	–	–	10	70.0	30.0	0.0	–
	Chau et al. [53]	0.45	0.62	10	60.0	40.0	0.0	13
	Ours (proposed approach + [53])	0.47	0.65	10	70.0	30.0	0.0	7
TUD-Crossing	Tang et al. [84]	–	–	–	53.8	38.4	7.8	–
	Chau et al. [53]	0.69	0.65	11	46.2	53.8	0.0	14
	Ours (proposed approach + [53])	0.72	0.67	11	53.8	46.2	0.0	8

In this work, we propose a new approach that improves tracking quality with a global tracker which merges all tracklets belonging to the same object over the whole video. In particular, we compute the reliability of each descriptor online, over time, based on its discriminative power. From the resulting discriminative descriptor weights, we compute a global matching score over the descriptors of two tracklets. We then apply the Hungarian algorithm to optimize tracklet matching. In addition, a motion model is combined with the appearance descriptors in a flexible way to improve the tracking quality. Figure 12 illustrates the idea. In frame 137, two objects have a similar appearance but move in different directions; in this case, the motion descriptor is more reliable. Conversely, in frame 553, two objects walk together, but the colors of their coats and hair differ; the appearance descriptors are therefore more reliable than the motion descriptor.
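The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `global_score` and `hungarian`, the two descriptor names, the similarity values and the weights are all hypothetical, and taking the cost as one minus the weighted score is an assumption made here for illustration. The weights are given by hand; the online estimation of descriptor reliability itself is not reproduced. The example mimics the frame 137 situation, where motion is weighted more heavily than appearance.

```python
from typing import Dict, List


def global_score(sims: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-descriptor similarities (each in [0, 1])."""
    total = sum(weights[d] for d in sims)
    return sum(weights[d] * sims[d] for d in sims) / total


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment on a square cost matrix (Hungarian algorithm,
    O(n^3) version with row/column potentials).
    Returns, for each row, the index of the column assigned to it."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    p = [0] * (n + 1)        # p[j] = row matched to column j, 0 if free
    way = [0] * (n + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical weights: motion is judged more reliable than appearance.
weights = {"appearance": 0.3, "motion": 0.7}

# Hypothetical similarities between ended tracklets (rows A, B)
# and newly started tracklets (columns C, D).
sims = [
    [{"appearance": 0.90, "motion": 0.2}, {"appearance": 0.8, "motion": 0.9}],
    [{"appearance": 0.85, "motion": 0.8}, {"appearance": 0.9, "motion": 0.3}],
]

# Assumed cost: 1 - global matching score, so the best matches cost least.
cost = [[1.0 - global_score(s, weights) for s in row] for row in sims]
matches = hungarian(cost)
print(matches)  # -> [1, 0]: A is merged with D, B with C
```

With these weights, the pairs that agree in motion are preferred even though appearance alone would favor the other pairing. A real tracker would also reject assignments whose score falls below a threshold, which this sketch omits.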

The proposed approach takes the output of the tracker in [53] as input and is tested on challenging datasets. Table 6 compares the results of this tracker with those of other trackers from the state of the art. This paper has been accepted at the PETS workshop [41].
